Natural language user interface
A natural language user interface (LUI) is a type of human-computer interface in which linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.
In interface design, natural language interfaces are sought after for their speed and ease of use, but most struggle to understand the wide variety of ambiguous input they receive.[1] Natural language interfaces are an active area of study in the fields of natural language processing and computational linguistics. An intuitive, general natural language interface is one of the active goals of the Semantic Web.
Text interfaces are 'natural' to varying degrees, and many formal (un-natural) programming languages incorporate idioms of natural human language. Likewise, a traditional keyword search engine could be described as a 'shallow' natural language user interface.
Overview
A natural language search engine would in theory find targeted answers to user questions (as opposed to keyword search). For example, when confronted with a question of the form 'which U.S. state has the highest income tax?', conventional search engines ignore the question and instead do a search on the keywords 'state, income and tax'. Natural language search, on the other hand, attempts to use natural language processing to understand the nature of the question and then to search and return a subset of the web that contains the answer to the question. If it works, results would have a higher relevance than results from a keyword search engine.
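The contrast between the two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stop-word list, the single question pattern, and the table of made-up tax rates for fictional regions. It is not how any real search engine works.

```python
import re

# Illustrative stop-word list (an assumption, not taken from a real engine).
STOP_WORDS = {"which", "what", "has", "the", "a", "an", "of", "is"}


def keyword_query(question):
    """Reduce a question to a bag of keywords, discarding its structure."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def parse_superlative(question):
    """Recognize one question shape: 'which X has the highest/lowest Y?'."""
    pattern = (r"which (?P<entity>.+?) has the "
               r"(?P<order>highest|lowest) (?P<attribute>.+?)\??$")
    match = re.match(pattern, question.lower())
    return match.groupdict() if match else None


def answer(question, table):
    """Answer the question directly from a {name: value} table."""
    parsed = parse_superlative(question)
    if parsed is None:
        return None
    pick = max if parsed["order"] == "highest" else min
    return pick(table, key=table.get)


# Made-up values for fictional regions, for illustration only.
rates = {"Alpha": 5.0, "Beta": 9.3, "Gamma": 0.0}
question = "which U.S. state has the highest income tax?"
print(keyword_query(question))
print(parse_superlative(question))
print(answer(question, rates))
```

The keyword path keeps only the words to match against documents, while the pattern-based path recovers what is being asked (an entity, an ordering and an attribute) and can therefore return a single answer.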
From a commercial standpoint, advertising on the results page could also be more relevant and could have a higher revenue potential than that of keyword search engines.
History
Natural languages have evolved alongside the human species throughout its history. In recent years, application designers have tried to improve communication between humans and machines, including through voice recognition techniques. Today the field of natural language recognition is working to improve its results and to overcome the difficulties discussed below.
Prototype NL interfaces had already appeared in the late sixties and early seventies.[2]
- Lunar, by William A. Woods, a natural language interface to a database containing chemical analyses of Apollo 11 moon rocks.
- Chat-80 transformed English questions into Prolog expressions, which were evaluated against the Prolog database. The code of Chat-80 was circulated widely, and formed the basis of several other experimental Nl interfaces.
- Janus is also one of the few systems to support temporal questions.
- Intellect from Trinzic (formed by the merger of AICorp and Aion).
- BBN's Parlance, built on experience from the development of the RUS and IRUS systems.
- IBM LanguageAccess
- Q&A from Symantec.
- Datatalker from Natural Language Inc.
- Loqui from Bim.
- English Wizard from Linguistic Technology Corporation.
Natural Language Processing
Difficulties of recognition
Recognition systems can be divided into two main types. Pattern recognition systems compare an incoming pattern with known, already classified patterns to determine their similarity. Phonetic systems use knowledge of the human body (speech production and hearing) to compare language features (phonetic features such as vowel sounds). More modern systems focus on the pattern recognition approach, which combines well with current computing techniques and tends to have greater accuracy.
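As a minimal sketch of the pattern recognition idea, the example below compares an input sequence with stored, labeled templates and picks the closest one. The text does not name a specific comparison method. Dynamic time warping (DTW) is used here as one classic choice for sequences spoken at different speeds, and the one-dimensional "feature" sequences and labels are invented for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the stored template closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical templates: one made-up feature value per time step.
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 2, 4]}
slow_yes = [1, 1, 3, 4, 4, 3, 1]  # same shape as "yes", spoken more slowly
print(classify(slow_yes, templates))
```

Real recognizers compare sequences of multi-dimensional feature vectors rather than single numbers, but the nearest-template principle is the same.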
Several factors[3] make these processes difficult because they affect the processing of the signal and therefore the recognition. Some of them are:
- Inter-speaker and intra-speaker phonetic variation: Inter-speaker variation is the variation that remains between speakers who utter the same sequence of words in the same style of speech and without geographic or social differences. Intra-speaker variation is the change within a single person's speech, for example between speaking spontaneously and reading aloud.
- Styles of speech: Speakers use a wide range of styles, which affect speech intelligibility. "The style depends on the speaker's attention to the characteristics of their own language production" (William Labov).
- Disfluencies in spontaneous speech: These cover a wide range of variations that interrupt the flow of speech, including pauses, repetitions, truncated words, vowel lengthening, interruptions, unfinished sentences and variations in speed.
- Characteristics of the environment: External factors can significantly interfere with signal processing. These include noise, which can distort or mask the signal, and changes in the acoustic surroundings, which can modify the signal temporarily.
Signal Processing
Implementing a natural language recognition system[4] involves passing the acoustic signal through several processing blocks that extract the features the system needs. The process can be summarized in the following steps:
1. The first step is the capture of the voice signal. A microphone converts the acoustic signal into an electrical signal, and an analog-to-digital converter (ADC) digitizes it so that parameters can be extracted. This step introduces an additional difficulty: the nonlinearity and frequency loss of the microphone and converter.
2. The next stage is segmentation and labeling, in which the system tries to find stable regions where the characteristics of the signal are constant. A widely used technique is to overlap successive analysis windows so that no part of the signal goes unanalyzed. Normalization and pre-emphasis filters, which prepare the signal for processing, are also typically applied at this level.
3. The third step is parameter calculation, which provides a spectral representation of the voice signal's features that can be used to train the recognition system (hidden Markov models (HMM), neural networks, among others). The most common methods at this stage are filter bank analysis and linear predictive coding (LPC). To calculate the coefficients that characterize the signal, the system follows a pattern of blocks standardized by ETSI.
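The pre-emphasis and overlapping-window operations of the second step can be sketched as follows. This is a simplified illustration, not the ETSI-standardized front end: the coefficient 0.97, the Hamming window and the frame sizes are common textbook choices assumed here, and the function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order filter y[n] = x[n] - alpha * x[n-1]; boosts high frequencies."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Cut the signal into frames; hop < frame_len makes the frames overlap."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Taper a frame with a Hamming window to soften its edges."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


# Toy signal: 10 samples, frames of 4 samples overlapping by half.
samples = [float(v) for v in range(10)]
frames = [hamming(f) for f in frame_signal(pre_emphasis(samples), 4, 2)]
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each windowed frame would then go to the parameter calculation stage (for example a filter bank or LPC analysis), which is outside the scope of this sketch.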
Types of Speech Recognition
Voice recognition systems can be divided into several classes according to the types of utterance they are able to recognize. These classes reflect the fact that one of the difficulties of automatic speech recognition (ASR) is determining when a speaker starts and finishes speaking. Some of these types are:
- Isolated word recognizers usually require each utterance to be delimited by silence (the lack of an audio signal) on both sides of the sample window. This does not mean that they accept only one word, but that they require a single utterance at a time. These systems often have "Listen / Not-Listen" states, which require the speaker to wait between utterances.
- Connected word systems (or "connected utterances") are similar to isolated word recognizers, but allow separate utterances to be run together with a minimal pause between them.
- Continuous recognizers are the most difficult to create because they must use special methods to determine utterance boundaries. They allow users to speak almost naturally while the computer determines the meaning.
- Spontaneous speech has a variety of definitions, but it can be thought of as speech that is natural sounding and unrehearsed. An ASR system with this ability should be able to handle a variety of natural language features.
- Voice verification / identification: some ASR systems can identify specific users. This kind of recognition is based mainly on features extracted from the speech of the subject to be verified or identified, such as signal amplitude, frequency and mel-scale cepstral coefficients.
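The problem of deciding when a speaker starts and finishes can be illustrated with a simple energy-based endpoint detector, the kind of silence test an isolated word recognizer relies on. This is a minimal sketch under assumed values: the threshold, the frame length and the number of quiet frames that end an utterance are arbitrary, and the function names are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_utterances(energies, threshold, min_silence):
    """Return (start, end) frame ranges, end exclusive.

    An utterance starts at the first frame whose energy reaches the
    threshold and ends once min_silence consecutive quiet frames follow.
    """
    utterances = []
    start = None
    quiet = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                utterances.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Signal ended while still inside an utterance.
        utterances.append((start, len(energies) - quiet))
    return utterances


energies = [0, 0, 5, 6, 0, 0, 0, 7, 8, 9, 0]
print(find_utterances(energies, threshold=1, min_silence=2))
```

Raising `min_silence` makes the detector tolerate short pauses inside an utterance, which is roughly the difference between requiring a clear wait between words and allowing connected words with minimal pauses.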
Challenges
Natural language interfaces have in the past led users to anthropomorphize the computer, or at least to attribute more intelligence to it than is warranted. This leads users to form unrealistic expectations of the system's capabilities. Such expectations make it difficult for users to learn the restrictions of the system, and lead to disappointment when the system fails to perform as expected.
A 1995 paper titled 'Natural Language Interfaces to Databases – An Introduction' describes some challenges:[2]
- Modifier attachment
The request “List all employees in the company with a driving licence” is ambiguous unless you know companies can't have driving licences.
- Conjunction and disjunction
“List all applicants who live in California and Arizona” is ambiguous unless you know that a person can't live in two places at once.
- Anaphora resolution
Resolve what a user means by 'he', 'she' or 'it' in a self-referential query.
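The conjunction example can be made concrete with a small sketch that runs both readings of "California and Arizona" against a toy table. The applicant names and the two function names are invented for illustration.

```python
# Hypothetical applicant records.
applicants = [
    {"name": "Ana", "state": "California"},
    {"name": "Ben", "state": "Arizona"},
    {"name": "Cy", "state": "Nevada"},
]


def literal_and(rows, states):
    """Literal reading: each applicant must live in every listed state."""
    return [r["name"] for r in rows if all(r["state"] == s for s in states)]


def intended_or(rows, states):
    """Intended reading: applicants who live in any of the listed states."""
    return [r["name"] for r in rows if r["state"] in states]


states = ["California", "Arizona"]
print(literal_and(applicants, states))  # no applicant satisfies this reading
print(intended_or(applicants, states))  # the reading the user most likely meant
```

A system needs the world knowledge that a person lives in one place at a time in order to reject the literal reading, which would always return an empty result.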
More generally, the speed and efficiency of the interface must also be considered. For any algorithm, these two factors largely determine whether one technique is better than another and therefore whether it succeeds in the market.
Finally, with regard to the techniques used, the main problem is to create a general algorithm that can recognize all kinds of voices regardless of nationality, gender or age, because the features extracted from different speakers saying the same word or phrase can differ significantly.
Uses and applications
Natural language interfaces that recognize input with satisfactory results allow this technology to be used in a range of applications. Some of the main uses are:
- Dictation: the most common use for ASR systems today. This includes medical transcriptions, legal and business dictation, as well as general word processing. In some cases special vocabularies are used to increase the accuracy of the system.
- Command and control: ASR systems designed to perform functions and actions on the system are called command-and-control systems. Utterances like "Open Netscape" and "Start a new xterm" will do just that.
- Telephony: some PBX/voice mail systems allow callers to speak commands instead of pressing buttons to send specific tones.
- Wearables: because input options are limited on wearable devices, speaking is a natural possibility.
- Medical/disabilities: many people have difficulty typing due to physical limitations such as repetitive strain injuries (RSI), muscular dystrophy, and many others. Similarly, people with difficulty hearing could use a system connected to their telephone to convert the caller's speech to text.
- Embedded applications: some new cellular phones include command-and-control (C&C) speech recognition that allows utterances such as "Call Home". This could be a major factor in the future of ASR and Linux.
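A command-and-control system typically accepts only a small, fixed set of utterance shapes. The sketch below maps already-recognized text to an action using a hand-written pattern list. The patterns, action names and `interpret` function are assumptions for illustration and do not describe any particular product.

```python
import re

# Hypothetical command grammar: (pattern, action name).
COMMANDS = [
    (re.compile(r"^open (?P<app>\w+)$"), "launch"),
    (re.compile(r"^start a new (?P<app>\w+)$"), "launch"),
    (re.compile(r"^call (?P<contact>\w+)$"), "dial"),
]


def interpret(utterance):
    """Map recognized text to (action, arguments), or None if unsupported."""
    text = utterance.strip().lower()
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None


for spoken in ["Open Netscape", "Start a new xterm", "Call Home", "Make coffee"]:
    print(spoken, "->", interpret(spoken))
```

Restricting the grammar in this way is what lets such systems run on limited hardware: the recognizer only has to choose among a few known commands instead of understanding arbitrary speech.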
The applications below use natural language recognition and integrate some of the uses listed above.
Ubiquity
Ubiquity, an add-on for Mozilla Firefox, is a collection of quick and easy natural-language-derived commands that act as mashups of web services, thus allowing users to get information and relate it to current and other webpages.
Wolfram Alpha
Wolfram Alpha is an online service that answers factual queries directly by computing the answer from structured data, rather than providing a list of documents or web pages that might contain the answer as a search engine would.[5] It was announced in March 2009 by Stephen Wolfram, and was released to the public on May 15, 2009.[6]
Siri
Siri is a personal assistant application for the iPhone OS. The application uses natural language processing to answer questions and make recommendations. The iPhone app is the first public product by its makers, who are focused on artificial intelligence applications.
According to Siri's marketing, the application adapts to the user's individual preferences over time, personalizes results, and accomplishes tasks such as making dinner reservations while the user is trying to catch a cab.[7]
Others
- Anboto Group provides Web customer service and e-commerce technology based on semantics and natural language processing. Anboto Group's main offerings are the Virtual Sales Agent and Intelligent Chat.
- Q-go - The Q-go technology provides relevant answers to users in response to queries on a company’s internet website or corporate intranet, formulated in natural sentences or keyword input alike. Q-go was acquired by RightNow Technologies in 2011.
- Ask.com - The original idea behind Ask Jeeves (Ask.com) was to allow users to get answers to questions posed in everyday, natural language, as well as traditional keyword searching. The current Ask.com still supports this, with added support for math, dictionary, and conversion questions.
- C-Phrase - C-Phrase is a web-based natural language front end to relational databases. C-Phrase runs under Linux, connects with PostgreSQL databases via ODBC and supports both select queries and updates. Currently only English is supported. C-Phrase is hosted on the Google Code site.
- GNOME Do - Allows users to quickly find miscellaneous artifacts of the GNOME environment (applications, Evolution and Pidgin contacts, Firefox bookmarks, Rhythmbox artists and albums, and so on) and execute basic actions on them (launch, open, email, chat, play, etc.).[8]
- Braina Project - Braina is natural language user interface software that is currently under development. It is being developed by a single programmer, Akash Shastri. The main goal of the project is to make computers understand human language so that a user can control a computer without using any commands.
- Brainboost — No longer available
- hakia - hakia is an Internet search engine. The company has invented an alternative infrastructure for indexing that uses the SemanticRank algorithm, a solution mix from the disciplines of ontological semantics, fuzzy logic, computational linguistics, and mathematics.
- Lexxe - Lexxe is an Internet search engine that uses natural language processing for queries (semantic search). Searches can be made with questions, such as "How old is Wikipedia?", as well as keywords and phrases. Lexxe is quite effective for factual questions, though its natural language analysis needs much improvement, both for facts and in other areas.
- Pikimal - Pikimal uses natural language tied to user preference to make search recommendations by template.
- Powerset — On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords.[9] On July 1, 2008, it was purchased by Microsoft.[10]
- START (MIT project) - START is a Web-based question answering system. Unlike information retrieval systems such as search engines, START aims to supply users with "just the right information," instead of merely providing a list of hits. Currently, the system can answer millions of English questions about places, movies, people and dictionary definitions.
- Swingly - Swingly is an answer engine designed to find exact answers to factual questions. According to its site, users ask a question in plain English and Swingly finds the answer or answers they are looking for.
- Yebol - Yebol is a vertical "decision" search engine that has developed a knowledge-based, semantic search platform. Its algorithms, which combine artificial intelligence with human intelligence, automatically cluster and categorize search results, web sites, pages and content, and present them in a visually indexed format that is more closely aligned with the user's initial intent. Yebol uses association, ranking and clustering algorithms to analyze related keywords or web pages. It integrates natural language processing, metasynthetic-engineered open complex systems, and machine algorithms with human knowledge for each query to establish a web directory that 'learns', using correlation, clustering and classification algorithms to automatically generate the knowledge query, which is retained and regenerated forward.[11]
- Inbenta - Inbenta's Search Engine is a multilingual, scalable, linguistic, and semantic-based search engine for the enterprise. It is based on the latest developments of the Meaning-Text Theory and provides intuitive search experiences using natural language.
- Mnemoo - Mnemoo is an answer engine that aims to directly answer questions posed in plain text (natural language), using a database of facts and an inference engine.
References
1. Hill, I. (1983). "Natural language versus computer language." In M. Sime and M. Coombs (Eds.) Designing for Human-Computer Communication. Academic Press.
2. Natural Language Interfaces to Databases – An Introduction, I. Androutsopoulos, G.D. Ritchie, P. Thanisch, Department of Artificial Intelligence, University of Edinburgh
3. http://liceu.uab.es/~joaquim/speech_technology/tecnol_parla/recognition/speech_recognition/reconocimiento.html#reconocimiento_tratamiento_senal
4. http://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/
5. Johnson, Bobbie (2009-03-09). "British search engine 'could rival Google'". The Guardian. http://www.guardian.co.uk/technology/2009/mar/09/search-engine-google. Retrieved 2009-03-09.
6. "So Much for A Quiet Launch". Wolfram Alpha Blog. 2009-05-08. http://blog.wolframalpha.com/2009/05/08/so-much-for-a-quiet-launch/. Retrieved 2009-10-20.
7. Siri webpage
8. Ubuntu 10.04 Add/Remove Applications description for GNOME Do
9. Helft, Miguel (May 12, 2008). "Powerset Debuts With Search of Wikipedia". The New York Times. http://bits.blogs.nytimes.com/2008/05/12/powerset-debuts-with-search-of-wikipedia/.
10. Johnson, Mark (July 1, 2008). "Microsoft to Acquire Powerset". Powerset Blog. Archived from the original on February 25, 2009. http://web.archive.org/web/20090225064356/http://www.powerset.com/blog/articles/2008/07/01/microsoft-to-acquire-powerset.
11. Humphries, Matthew. "Yebol.com steps into the search market". Geek.com. 31 July 2009.